
    GuideLink: A Corpus Annotation System that Integrates the Management of Annotation Guidelines

    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    Discriminative application of string similarity methods to chemical and non-chemical names for biomedical abbreviation clustering

    BACKGROUND: Various computational methods are presently available to classify whether a protein variation is disease-associated or not. However, data derived from recent technological advances make it feasible to extend the annotation of disease-associated variations to include specific phenotypes. Here we tackle the problem of distinguishing genetic variations associated with cancer from variations associated with other genetic diseases. RESULTS: We implement a new method based on Support Vector Machines that takes as input the protein variant and the protein function, as described by its associated Gene Ontology terms. Our approach succeeds in discriminating germline variants that are likely to be cancer-associated from those related to other genetic disorders. The method achieves 90% accuracy and a Matthews correlation coefficient of 0.61 on a set of 6478 germline variations (16% of them cancer-associated) in 592 proteins. The sensitivity and the specificity on the cancer class are 69% and 66%, respectively. Furthermore, the method correctly excludes some 96% of 3392 somatic cancer-associated variations in 1983 proteins not included in the training/testing set. CONCLUSIONS: Here we show that a large set of cancer-associated germline protein variations can be successfully discriminated from those associated with other genetic disorders. This is a further step in the process of protein variant annotation. Scoring improves substantially when protein function, as encoded by Gene Ontology terms, is considered, corroborating the role of protein function as a key feature for the correct annotation of its variations.
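    The performance figures above (accuracy, Matthews correlation coefficient, sensitivity, specificity) all derive from a binary confusion matrix. A minimal Python sketch of the standard formulas follows; the counts in the usage line are hypothetical, not taken from the study. Some papers in this area report per-class "specificity" as TP/(TP+FP), so the sketch returns both that quantity (as `precision`) and the conventional true-negative rate; which definition the authors used is an assumption to check against the full text.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard scores for a binary classifier from confusion-matrix counts.

    The positive class stands for "cancer-associated"; all counts are
    supplied by the caller (nothing here is computed from real data).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Sensitivity (recall): fraction of actual positives that are recovered.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    # Precision: fraction of positive predictions that are correct
    # (reported as per-class "specificity" in some bioinformatics papers).
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Conventional specificity: true-negative rate.
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # Matthews correlation coefficient; set to 0 by a common convention
    # when any row or column of the confusion matrix is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(binary_metrics(tp=70, fp=35, tn=850, fn=45))
```

    MCC is included alongside accuracy because, with only 16% positives, a classifier could score high accuracy while missing most cancer-associated variants; MCC penalizes that imbalance.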

    Apply of Textmining Method to Study the Roles in Improving the Health by Lactoferrin, a Multi-Functional Milk Protein

    Lactoferrin is a metal-binding glycoprotein found in milk, blood and other exocrine secretions. It is a multi-functional protein that exhibits many activities, including anti-microbial, anti-viral, immunomodulatory, anti-inflammatory, anti-tumor, anti-metastatic, cell growth-promoting and anti-oxidant activities, as well as regulation of granulopoiesis and iron absorption. To date, many academic reports on the biological activities of lactoferrin have been published and are easily accessible through public databases. To overcome the resulting information overload, we applied text mining to the accumulated lactoferrin literature. To this end, we used the information extraction system GENPAC (provided by Nalapro Technologies Inc., Tokyo), which uses natural language processing and text mining technology. Using GENPAC, text extraction was carried out on PubMed literature containing the term “lactoferrin” together with any of a set of keywords concerning health conditions or diseases. Network maps of the extracted information were then produced using Cytoscape. We will show that this combination of text mining and information visualization is useful for studying novel relationships among the many functions and mechanisms by which lactoferrin may improve health.
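    GENPAC is a commercial NLP system whose extraction rules are not described here, so the following is only a rough stand-in: a plain keyword co-occurrence count over abstracts, written out in Cytoscape's simple interaction format (SIF, one `source interaction target` edge per line, tab-separated). The function names, the `co_occurs` interaction label and the sample texts are hypothetical.

```python
import re
from collections import Counter


def cooccurrence_counts(texts, hub, keywords):
    """Count, for each keyword, how many texts mention it together with `hub`.

    Matching is case-insensitive and on whole words. This is simple
    co-occurrence, not the NLP-based extraction a system like GENPAC performs.
    """
    counts = Counter()
    hub_pattern = re.compile(r"\b" + re.escape(hub) + r"\b", re.IGNORECASE)
    kw_patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    for text in texts:
        if not hub_pattern.search(text):
            continue  # only texts that mention the hub term are considered
        for kw, pattern in kw_patterns.items():
            if pattern.search(text):
                counts[kw] += 1
    return counts


def to_sif(hub, counts, interaction="co_occurs", min_count=1):
    """Render hub-keyword edges as tab-separated SIF lines for Cytoscape."""
    return [
        f"{hub}\t{interaction}\t{kw}"
        for kw, n in sorted(counts.items())
        if n >= min_count
    ]


if __name__ == "__main__":
    abstracts = [  # hypothetical snippets, not real PubMed records
        "Lactoferrin showed anti-inflammatory effects in colitis.",
        "Oral lactoferrin and iron absorption in infants.",
        "Iron absorption was unaffected by diet.",
    ]
    counts = cooccurrence_counts(abstracts, "lactoferrin",
                                 ["colitis", "iron absorption"])
    print("\n".join(to_sif("lactoferrin", counts)))
```

    Tabs are used as the SIF delimiter so that multi-word node names such as "iron absorption" stay intact when the file is imported; `min_count` can be raised to keep only frequently co-mentioned terms in the network.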

    Cancer gene expression database (CGED): a database for gene expression profiling with accompanying clinical information of human cancer tissues

    Gene expression profiling of cancer tissues is expected to contribute to our understanding of cancer biology as well as to the development of new methods of diagnosis and therapy. Our collaborative efforts in Japan have mainly focused on solid tumors such as breast, colorectal and hepatocellular cancers. The expression data are obtained by a high-throughput RT–PCR technique, and patients are recruited mainly from a single hospital. In the cancer gene expression database (CGED), the expression and clinical data are presented in a way that is useful for scientists interested in specific genes or biological functions. The data can be retrieved either by gene identifiers or by functional categories defined by Gene Ontology terms or Swiss-Prot annotation. Expression patterns of multiple genes, selected by name or by similarity search of the patterns, can be compared. Visual presentation of the data with a sorting function enables users to easily recognize relationships between gene expression and clinical parameters. Data for other cancers, such as lung and thyroid cancers, will be added in the near future. The URL of CGED is http://cged.hgc.jp
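    The abstract does not say which similarity measure CGED uses when searching for genes with similar expression patterns. As one plausible reading, the sketch below ranks genes by Pearson correlation between their expression profiles across the same set of tissue samples; the gene names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A flat profile has no defined correlation; treat it as unrelated.
    return cov / (sx * sy) if sx and sy else 0.0


def most_similar(query, profiles, top_k=5):
    """Return the `top_k` genes whose profiles correlate best with `query`."""
    ref = profiles[query]
    scored = [(gene, pearson(ref, prof))
              for gene, prof in profiles.items() if gene != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical expression levels of four genes across four tissue samples.
    profiles = {
        "GENE_A": [1.0, 2.0, 3.0, 4.0],
        "GENE_B": [2.1, 3.9, 6.2, 7.8],
        "GENE_C": [4.0, 3.0, 2.0, 1.0],
        "GENE_D": [1.0, 1.0, 1.0, 1.0],
    }
    for gene, r in most_similar("GENE_A", profiles, top_k=3):
        print(f"{gene}\t{r:+.3f}")
```

    Correlation compares the shape of two profiles rather than their absolute levels, which suits a "genes that rise and fall together" query; a distance measure would be the alternative if absolute expression mattered.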

    DDBJ launches a new archive database with analytical tools for next-generation sequence data

    The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) collected and released 1 701 110 entries/1 116 138 614 bases between July 2008 and June 2009. Highlighted data releases from DDBJ included the complete genome sequence of an endosymbiont within protist cells in the termite gut and Cap Analysis Gene Expression tags for human and mouse deposited by the Functional Annotation of the Mammalian cDNA consortium. In this period, we started a new user announcement service that uses Really Simple Syndication (RSS) to deliver a daily list of data released from DDBJ. We also attempted a comprehensive visualization of DDBJ release data using a word cloud program. Moreover, a new archive for sequencing data from next-generation sequencers, the ‘DDBJ Read Archive’ (DRA), was launched. Concurrently, a semi-automatic annotation tool for read data registered in DRA, the ‘DDBJ Read Annotation Pipeline’, was released as a preliminary step. The pipeline consists of two parts: basic analysis (reference genome mapping and de novo assembly) and high-level analysis (structural and functional annotation). These new services will aid users’ research and provide easier access to DDBJ databases.
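    The RSS announcement service described above delivers a daily list of newly released entries. As a sketch of what such a feed contains, the snippet below builds a minimal RSS 2.0 document with Python's standard `xml.etree.ElementTree`; the accession numbers, date and example.org link are placeholders, and DDBJ's actual feed layout is not described in the abstract.

```python
import xml.etree.ElementTree as ET


def build_release_feed(release_date, accessions, channel_link):
    """Build a minimal RSS 2.0 feed listing entries released on one day.

    `accessions` and `channel_link` are supplied by the caller; the element
    layout is a generic RSS 2.0 skeleton, not DDBJ's actual feed format.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = f"Entries released on {release_date}"
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = (
        f"{len(accessions)} entries released on {release_date}"
    )
    for acc in accessions:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = acc
        # A non-permalink guid lets feed readers de-duplicate items.
        ET.SubElement(item, "guid", isPermaLink="false").text = acc
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Placeholder accession numbers and link, for illustration only.
    print(build_release_feed("2009-06-30", ["AB000001", "AB000002"],
                             "https://example.org/releases"))
```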

    The Molecule Role Ontology: An Ontology for Annotation of Signal Transduction Pathway Molecules in the Scientific Literature

    In general, it is not easy to assign a single sequence identity to each molecule name that appears in a pathway described in the scientific literature. A molecule name may stand for concepts of various granularities, from concrete objects such as H-Ras and ERK1 to abstract concepts or categories such as Ras and MAPK. Typically, the relations among molecule names form a hierarchical structure; without a proper way to handle this knowledge, it becomes increasingly difficult to develop a reliable pathway database. This paper describes an ontology designed to annotate molecules in the scientific literature on signal transduction pathways.
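    One simple way to capture this granularity problem is an is-a hierarchy in which each concrete molecule name points to the broader category it belongs to, so that a mention of "MAPK" in one paper can be related to "ERK1" in another. The sketch below is a hypothetical illustration built only from the examples in the abstract; it is not the structure of the Molecule Role Ontology itself, and `IS_A`, `ancestors` and `subsumes` are illustrative names.

```python
# Hypothetical is-a links built from the examples in the abstract
# (child -> parent); a real ontology would hold many more terms and relations.
IS_A = {
    "H-Ras": "Ras",
    "ERK1": "MAPK",
}


def ancestors(name, is_a=IS_A):
    """Return the chain of broader concepts above `name`, nearest first."""
    chain = []
    seen = {name}
    while name in is_a:
        name = is_a[name]
        if name in seen:  # guard against accidental cycles
            raise ValueError(f"cycle detected at {name!r}")
        seen.add(name)
        chain.append(name)
    return chain


def subsumes(general, specific, is_a=IS_A):
    """True if `specific` equals `general` or lies beneath it in the hierarchy."""
    return general == specific or general in ancestors(specific, is_a)


if __name__ == "__main__":
    print(ancestors("ERK1"))           # -> ['MAPK']
    print(subsumes("Ras", "H-Ras"))    # -> True (concrete object under its category)
    print(subsumes("MAPK", "H-Ras"))   # -> False (unrelated branches)
```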